Welcome to my Notebook! In this notebook I aimed to draw attractive visuals by using R and examined some important metrics, which are the key indicators of level of development. Also, I took a broader look at the general trend over years.
If you have a question or feedback, feel free to write and if you like this notebook, please do not forget to UPVOTE 🙂
Before we get started, I set up the environment. In this section, I loaded the libraries and read the data. Briefly, the dataset includes infant mortality rates (per 1,000 live births), adjusted net national income per capita (current US$) and total fertility rate (births per woman) between 1970 and 2016. Lastly, I adjusted the region names.
# Libraries
library(ggalt)
library(dplyr)
library(tidyr)
library(CGPfunctions)
library(openxlsx)
library(ggplot2)
library(ggrepel)
library(rworldmap)
library(RColorBrewer)
library(animation)
library(gridExtra)
library(magick)
library(plotly)
library(gganimate)
library(gapminder)
# Turning off warning messages
options(warn=-1)
# get the current directory
current_dir <- getwd()
# Reading xlsx document
MortalityFertilityIncome <- read.xlsx(paste0(current_dir,"/data/MortalityFertilityIncome.xlsx"))
attach(MortalityFertilityIncome)
# Changing the Region names
MortalityFertilityIncome$Region <- factor(MortalityFertilityIncome$Region,
levels = c("Europe & Central Asia",
"North America",
"Latin America & Caribbean",
"East Asia & Pacific",
"South Asia",
"Middle East & North Africa",
"Sub-Saharan Africa"),
labels = c("Europe &\nCentral Asia",
"North America",
"Latin America &\nCaribbean",
"East Asia &\nPacific",
"South Asia",
"Middle East &\nNorth Africa",
"Sub-Saharan\nAfrica"))In this section, I examined the three important indicators by using different type of charts and selected 2016 as the year but if you want to examine another year, you can do that by doing small changes in the code.
To compare infant mortality rate and income per capita among countries, I preferred a scatter plot and fitted a line by using a simple linear regression and calculated R-squared, which is defined as a statistical measure of how close the data are to the fitted regression line. It seems that the infant mortality rate and income per capita are negatively correlated.
# Calculate R2
mR2 <- summary(lm(m2016 ~ i2016 + log(i2016), data = MortalityFertilityIncome))$r.squared
mR2 <- paste0(format(mR2, digits = 2), "%")
# ggplot
p <- ggplot(MortalityFertilityIncome,
aes(x = i2016, y = m2016)) +
# Draw and color the points
geom_point(mapping = aes(color = Region),
shape = 1,
size = 4,
stroke = 1.5) +
# Draw a line fits to data
geom_smooth(mapping = aes(linetype = "r2"),
method = "lm",
formula = y ~ x + log(x), se = FALSE,
color = "red") +
# Determine the countries which will have dark labels
geom_text_repel(mapping = aes(label = Country.Name, alpha = labels),data = transform(MortalityFertilityIncome,labels = Country.Name %in% c("Turkey", "Russia", "Venezuela", "Iraq", "Mayanmar", "Sudan", "Afghanistan", "Congo", "Greece", "Argentinia", "Brazil", "India", "China", "South Africa", "Spain", "Cape Verde","Bhutan", "Rwanda", "France", "Botswana", "France", "US", "Germany", "Britain", "Barbados", "Japan", "Norway", "New Zealand", "Sigapore"))) +
# Adjusting x axis
scale_x_continuous(name = "Adjusted Net National Income Per Capita (current US$)",
limits = c(0, 70000),
breaks = seq(0,70000, by=10000)) +
# Adjusting y axis
scale_y_continuous(name = "Mortality Rate, Infant (per 1,000 live births)",
limits = c(0, 90),
breaks = seq(0,90, by = 10)) +
# Scale colors
scale_color_manual(name = "",
values = c("#BDB76B",
"#FF8C00",
"#28AADC",
"#248E84",
"#F2583F",
"#96503F",
"#24576D"),
guide = guide_legend(nrow = 1)) +
# Cancel the alpha legend
scale_alpha_discrete(guide = FALSE) +
# Adjust the R2 legend
scale_linetype(name = " ",
breaks = "r2",
labels = list(bquote(R^2==.(mR2))),
guide = guide_legend(override.aes = list(linetype = 1, size = 2, color = "red"))) +
# Plot title
ggtitle("Year: 2016") +
# Choose a theme
theme_bw() +
# Adjust the theme
theme(panel.border = element_blank(),
panel.grid = element_blank(),
panel.grid.major.y = element_line(color = "gray"),
axis.line.x = element_line(color = "gray"),
axis.text = element_text(face = "italic", size = 14),
axis.title = element_text(face = "bold",size = 16),
legend.position = "top",
legend.direction = "horizontal",
legend.box = "horizontal",
legend.text = element_text(size = 16),
plot.title = element_text(size = 18, face = "bold"))
ggsave(p, file = paste0(current_dir,"/figures/Figure01.png"), width = 14.5, height = 8.5)
fig1 <- image_read(paste0(current_dir,"/figures/Figure01.png"))
fig1As in the previous section, I used a scatter plot to compare infant mortality and total fertility rates among countries. As opposed to the previous plot, it can be seen that the variables are positively correlated.
# Calculate R2
mR2 <- summary(lm(m2016 ~ f2016 + log(f2016), data = MortalityFertilityIncome))$r.squared
mR2 <- paste0(format(mR2, digits = 2), "%")
# ggplot
p <- ggplot(MortalityFertilityIncome,
aes(x = f2016, y = m2016)) +
# Draw and color the points
geom_point(mapping = aes(color = Region),
shape = 1,
size = 4,
stroke = 1.5) +
# Draw a line fits to data
geom_smooth(mapping = aes(linetype = "r2"),
method = "lm",
formula = y ~ x + log(x), se = FALSE,
color = "red") +
# Determine the countries which will have dark labels
geom_text_repel(mapping = aes(label = Country.Name, alpha = labels),data = transform(MortalityFertilityIncome,labels = Country.Name %in% c("Turkey", "Russia", "Venezuela", "Iraq", "Mayanmar", "Sudan", "Afghanistan", "Congo", "Greece", "Argentinia", "Brazil", "India", "China", "South Africa", "Spain", "Cape Verde","Bhutan", "Rwanda", "France", "Botswana", "France", "US", "Germany", "Britain", "Barbados", "Japan", "Norway", "New Zealand", "Sigapore"))) +
# Adjusting x axis
scale_x_continuous(name = "Fertility Rate, Total (births per woman)",
limits = c(0, 8),
breaks = seq(0,8, by=1)) +
# Adjusting y axis
scale_y_continuous(name = "Mortality Rate, Infant (per 1,000 live births)",
limits = c(0, 90),
breaks = seq(0,90, by = 10)) +
# Scale colors
scale_color_manual(name = "",
values = c("#BDB76B",
"#FF8C00",
"#28AADC",
"#248E84",
"#F2583F",
"#96503F",
"#24576D"),
guide = guide_legend(nrow = 1)) +
# Cancel the alpha legend
scale_alpha_discrete(guide = FALSE) +
# Adjust the R2 legend
scale_linetype(name = " ",
breaks = "r2",
labels = list(bquote(R^2==.(mR2))),
guide = guide_legend(override.aes = list(linetype = 1, size = 2, color = "red"))) +
# Plot title
ggtitle("Year: 2016") +
# Choose a theme
theme_bw() +
# Adjust the theme
theme(panel.border = element_blank(),
panel.grid = element_blank(),
panel.grid.major.y = element_line(color = "gray"),
axis.line.x = element_line(color = "gray"),
axis.text = element_text(face = "italic", size = 14),
axis.title = element_text(face = "bold",size = 16),
legend.position = "top",
legend.direction = "horizontal",
legend.box = "horizontal",
legend.text = element_text(size = 16),
plot.title = element_text(size = 18, face = "bold"))
ggsave(p, file = paste0(current_dir,"/figures/Figure02.png"), width = 14.5, height = 8.5)
fig2 <- image_read(paste0(current_dir,"/figures/Figure02.png"))
fig2In order to compare the three metrics together I used 3D plot. Marker color determines the mortality rate. Red means high mortality rate and green means low mortality rate.
fig <- plot_ly(MortalityFertilityIncome, x = i2016, y = m2016, z = f2016,
hovertemplate = paste('Adjusted Net National Income Per Capita: $%{x:.2f}, <br>',
'Mortality Rate: %%{y:.2f}, <br>',
'Fertility Rate: %%{z:.2f}')) %>%
add_markers(color = m2016, colors = c("#4d934d", "#FFA500", "#8b0000")) %>%
colorbar(title = "Mortality <br>Rate")
fig %>% layout(scene = list(xaxis = list(title = "Adjusted Net National <br>Income Per Capita"),
yaxis = list(title = "Mortality Rate"),
zaxis = list(title = "Fertility Rate")))I looked at the general levels of the regions in 2016 by using boxplots. The figures show that when the income per capita is high, regions have low infant mortality and fertility rates.
m <- list(
l = 50,
r = 120,
b = 65,
t = 65,
pad = 4
)
b1 <- plot_ly(MortalityFertilityIncome, x=~Region, y=m2016, type = "box", color = Region) %>% layout(showlegend = FALSE, autosize = F, width = 930, height = 400)
b2 <- plot_ly(MortalityFertilityIncome, x=~Region, y=i2016, type = "box", color = Region) %>% layout(showlegend = FALSE,autosize = F, width = 930, height = 400)
fig3 <- subplot(b1, b2)
fig3 <- fig3%>% layout(annotations = list(
list(x = 0.02 , y = 1.1, text = "<b>Mortality Rate, Infant (per 1,000 live births)</b>", showarrow = F, xref='paper', yref='paper'),
list(x = 1 , y = 1.1, text = "<b>Adjusted Net National Income Per Capita (current US$)</b>", showarrow = F, xref='paper', yref='paper')),
margin = m
)
fig3m <- list(
l = 50,
r = 120,
b = 65,
t = 65,
pad = 4
)
b3 <- plot_ly(MortalityFertilityIncome, x=~Region, y=f2016, type = "box", color = Region) %>%
layout(annotations = list(
list(x = 0.5 , y = 1.1, text = "<b>Fertility Rate, Total (births per woman)</b>", showarrow = F, xref='paper', yref='paper')), showlegend = FALSE, autosize = F, width = 900, margin = m)
b3In the previous sections, I examined the year 2016 but what about the 1970 - 2016 period. This can be observed from the below figure. The average infant mortality and total fertility rates decrease steadily by time. On the other hand, the average income per capita rises until around 2008 and later, it fluctuates until 2016.
# Determining categories for the columns
year <- seq(1970, 2016, by=1)
years<-rep(year,3)
# Space is important for the order
mor<-rep('Mortality Rate, Infant (per 1,000 live births)', 47)
fer<-rep('Fertility Rate, Total (births per woman)', 47)
inc<-rep('Adjusted Net National Income Per Capita (current US$)', 47)
category<-c(mor,fer,inc)
# Calculating the means of the columns
m<-rep(0,141)
for(i in 1:141){
m[i]<-mean(MortalityFertilityIncome[,i+3], na.rm=TRUE)
}
# Create a dataframe
datamean=data.frame(Years=years, Means = m, Category = category)
attach(datamean)
# ggplot
p<-ggplot(datamean, aes(x = Years, y = Means)) +
# Draw the lines
geom_line(aes(color='red')) +
# Drawing charts according to their categories
facet_wrap(~Category, scales='free', ncol=2) +
# Adjusting x axis
scale_x_continuous(name = "Year") +
# Adjusting y axis
scale_y_continuous(name = "Mean")+
# Choose a theme
theme_bw() +
# Adjust the theme
theme(panel.border = element_blank(),
panel.grid = element_blank(),
panel.grid.major.y = element_line(color = "gray"),
axis.line.x = element_line(color = "gray"),
axis.text = element_text(face = "italic", size = 14),
axis.title = element_text(face = "bold",size = 16),
legend.position = "non",
strip.text = element_text(size=16),
plot.title = element_text(size = 18, face = "bold"))
ggsave(p, file = paste0(current_dir,"/figures/Figure04.png"), width = 14.5, height = 8.5)
fig4 <- image_read(paste0(current_dir,"/figures/Figure04.png"))
fig4In this part, I depicted the variables on map animations by using rworldmap and animation libraries in order to compare countries easier.
# Converting data into an object that rworldmap understands
mapped_data <- joinCountryData2Map(MortalityFertilityIncome, joinCode = "ISO3", nameJoinColumn = "Country.Code")## 214 codes from your data successfully matched countries in the map
## 3 codes from your data failed to match with a country code in the map
## 29 codes from the map weren't represented in your data
# Arrange Margins
par(mar=c(1,0,1,0),xaxs="i",yaxs="i")
# Columns
mortalityears <- c("m1970","m1975","m1980","m1985","m1990","m1995", "m2000","m2005","m2010","m2015","m2016")
incomeyears <- c("i1970","i1975","i1980","i1985","i1990","i1995", "i2000","i2005","i2010","i2015","i2016")
fertilityears <- c("f1970","f1975","f1980","f1985","f1990","f1995", "f2000", "f2005", "f2010","f2015","f2016")
years <- c("1970","1975","1980","1985","1990","1995", "2000", "2005", "2010","2015","2016")
#%%
# Identify colour palette
colourPalette <- brewer.pal(7,'Reds')
# Create the first gif
saveGIF({
for(i in 1:length(mortalityears)){
mapMortality <- mapCountryData(mapped_data,
nameColumnToPlot = mortalityears[i],
mapTitle = paste("Mortality Rate, Infant (per 1,000 live births) - ",years[i]),
colourPalette = colourPalette,
catMethod='fixedWidth',
addLegend = F,
oceanCol="lightblue",
missingCountryCol="white")
# Adding Legend
do.call(addMapLegend, c(mapMortality,
legendLabels="all",
legendWidth=0.5))
}
}, #interval = 1,
movie.name = paste0(current_dir,"/figures/Figure05.gif"), ani.width = 1000, ani.height = 600 )## [1] FALSE
fig5 <- image_read(paste0(current_dir,"/figures/Figure05.gif"))
fig5# Identify colour palette
colourPalette <- brewer.pal(7,"Greens")
# Create the second gif
saveGIF({
for(i in 1:length(incomeyears)){
mapIncome<-mapCountryData(mapped_data,
nameColumnToPlot = incomeyears[i],
mapTitle = paste("Adjusted Net National Income Per Capita (current US$) - ",years[i]),
colourPalette = colourPalette,
catMethod='fixedWidth',
addLegend = F,
oceanCol="lightblue",
missingCountryCol="white")
# Adding Legend
do.call(addMapLegend, c(mapIncome,
legendLabels="all",
legendWidth=0.5))
}
}, interval = 1, movie.name = paste0(current_dir,"/figures/Figure06.gif"), ani.width = 1000, ani.height = 600)## [1] FALSE
fig6 <- image_read(paste0(current_dir,"/figures/Figure06.gif"))
fig6#Identify colour palette
colourPalette <- brewer.pal(7,"Blues")
# Create the third gif
saveGIF({
for(i in 1:length(fertilityears)){
mapIncome<-mapCountryData(mapped_data,
nameColumnToPlot = fertilityears[i],
mapTitle = paste("Fertility Rate, Total (births per woman) - ",years[i]),
colourPalette = colourPalette,
catMethod='fixedWidth',
addLegend = F,
oceanCol="lightblue",
missingCountryCol="white")
# Adding Legend
do.call(addMapLegend, c(mapIncome,
legendLabels="all",
legendWidth=0.5))
}
}, interval = 1, movie.name = paste0(current_dir,"/figures/Figure07.gif"), ani.width = 1000, ani.height = 600 )## [1] FALSE
fig7 <- image_read(paste0(current_dir,"/figures/Figure07.gif"))
fig7gapminder is an important dataset which will help us to reach useful insights to determine key indicators of the level of development. First, I import the data and examine it.
str(gapminder)## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num [1:1704] 28.8 30.3 32 34 36.1 ...
## $ pop : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
head(gapminder)## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
By using a pair plot we can glance the relation among the features. This plot is often a good starting point.
pairs(gapminder)In this part, I examined the GDP Per Capita, Life Expectancy and their yearly changes. Also, we can compare the countries and continents. This animation shows the annual changes and when we hover our mouse on the points, we can see the country names. It looks like GDP Per Capita and Life Expectancy increases by time for most of the countries. Also, generally European Countries have better life ecpectancy.
df <- gapminder
fig8 <- df %>%
plot_ly(
x = ~gdpPercap,
y = ~lifeExp,
size = ~pop,
color = ~continent,
frame = ~year,
text = ~country,
hoverinfo = "text",
type = 'scatter',
mode = 'markers'
)
fig8 <- fig8 %>% layout(
xaxis = list(title = "GDP Per Capita", type = "log"),
yaxis = list(title = "Life Expentency")
)
fig8 <- fig8 %>% animation_opts(
1000, easing = "elastic", redraw = FALSE
)
fig8 <- fig8 %>% animation_button(
x = 1, xanchor = "right", y = 0, yanchor = "bottom"
)
fig8 <- fig8 %>% animation_slider(
currentvalue = list(prefix = "YEAR ", font = list(color="red"))
)
fig8As an alternative the below chart might be used. However, since this is not a dynamic chart, the knowledge you may reach will be limited.
p <- ggplot(
df,
aes(x = gdpPercap, y=lifeExp, size = pop, colour = country)
) +
geom_point(show.legend = FALSE, alpha = 0.7) +
scale_color_viridis_d() +
scale_size(range = c(2, 12)) +
scale_x_log10() +
labs(x = "GDP per capita", y = "Life expectancy")
fig9 <- p + facet_wrap(~continent) +
transition_time(year) +
labs(title = "Year: {frame_time}")
fig9